Machine learning for the prediction of in-hospital mortality in patients with spontaneous intracerebral hemorrhage in intensive care unit

This study aimed to develop a machine learning (ML)-based tool for early and accurate prediction of in-hospital mortality risk in patients with spontaneous intracerebral hemorrhage (sICH) in the intensive care unit (ICU). We conducted a retrospective study and identified cases of sICH from the MIMIC-IV (n = 1486) and Zhejiang Hospital (n = 110) databases. The model was constructed using features selected through LASSO regression. Among five well-known models, the best model was selected based on the area under the curve (AUC) in the validation cohort. We further analyzed calibration and decision curves to assess prediction results and visualized the impact of each variable on the model through SHapley Additive exPlanations. To facilitate accessibility, we also created a visual online calculation page for the model. The XGBoost model exhibited high accuracy in both internal validation (AUC = 0.907) and external validation (AUC = 0.787) sets. Calibration curve and decision curve analyses showed that the model had no significant bias and was useful for supporting clinical decisions. XGBoost is an effective algorithm for predicting in-hospital mortality in patients with sICH, indicating its potential significance in the development of early warning systems.


Database and ethics
The Medical Information Mart for Intensive Care-IV (MIMIC-IV) is an open and freely accessible critical care database that contains comprehensive clinical data of patients admitted to a tertiary academic medical center in Boston, MA, USA, from 2008 to 2019. The database includes essential patient information, vital signs, laboratory indicators, treatment details, and survival data. The usage of data from MIMIC-IV has been granted approval by the Institutional Review Boards of Beth Israel Deaconess Medical Center (Boston, MA) and Massachusetts Institute of Technology (MIT; Cambridge, MA). As all personal data in this database is encrypted, informed consent was waived. One of the authors (Mao, Baojie) obtained access to the database and was responsible for data extraction (certification number 46148427). In addition, we recruited patients with cerebral hemorrhage who were admitted to ICU from December 2018 to February 2023 in Zhejiang Hospital. The study protocol was approved by the Ethics Review Committee of Zhejiang Hospital (No. 2023 Pro-examination (58 K)). All methods and procedures were carried out in accordance with the Declaration of Helsinki. All patient data were anonymized. No patient-identifiable data were recorded throughout the study. Given that this study was purely observational, written consent from patients was not required.

Data extraction and outcomes
Clinical and laboratory variables were meticulously collected within 24 h of admission to the Intensive Care Unit (ICU). In the case of variables with multiple measurements, mean values were calculated and utilized for analysis. A total of forty-six variables were included in the data collection process. These encompassed patient characteristics (age, gender), vital signs (respiratory rate, blood pressure, heart rate, oxygen saturation, and temperature), laboratory data (routine blood analysis, renal function, coagulation, and blood gases), as well as comorbidities identified based on recorded International Classification of Diseases ICD-9 and ICD-10 codes. The comorbidities considered were hypertension, diabetes mellitus, chronic obstructive pulmonary disease (COPD), congestive heart failure, renal disease, liver disease, and malignancy. Furthermore, information regarding the usage of anticoagulant and vasoactive drugs, surgical status, Glasgow Coma Scale (GCS), Sequential Organ Failure Assessment (SOFA) scores, mechanical ventilation, and renal replacement therapy (RRT) was gathered. Due to the limited number of patients with missing data, we opted to exclude them from the analysis rather than attempting to estimate the missing values. The primary endpoint was all-cause in-hospital mortality.
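The two preprocessing rules described above (averaging repeated measurements taken within the first 24 h and excluding patients with missing data) can be illustrated with a minimal Python sketch. The record layout, the function name summarize_first_day, and the variable names are assumptions made for illustration only; they are not taken from the study's extraction code.

```python
from statistics import mean


def summarize_first_day(measurements, required):
    """Average repeated measurements per patient and keep complete cases only.

    measurements: iterable of (patient_id, variable, hours_since_icu_admission, value)
    required: variable names that every retained patient must have
    """
    per_patient = {}
    for pid, var, hours, value in measurements:
        # Keep only non-missing values recorded within 24 h of ICU admission.
        if 0 <= hours <= 24 and value is not None:
            per_patient.setdefault(pid, {}).setdefault(var, []).append(value)

    rows = {}
    for pid, by_var in per_patient.items():
        # Complete-case analysis: drop patients lacking any required variable.
        if all(var in by_var for var in required):
            rows[pid] = {var: mean(by_var[var]) for var in required}
    return rows


# Hypothetical records, for illustration only.
records = [
    ("p1", "heart_rate", 1.0, 80.0),
    ("p1", "heart_rate", 12.0, 90.0),
    ("p1", "temperature", 2.0, 37.0),
    ("p1", "heart_rate", 30.0, 120.0),  # outside the 24 h window, ignored
    ("p2", "heart_rate", 3.0, 70.0),    # no temperature, so p2 is excluded
]
cohort = summarize_first_day(records, ["heart_rate", "temperature"])
print(cohort)
```

Complete-case exclusion is simple but discards information; it is a reasonable choice only when, as stated above, few patients have missing data.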

Cohort selection
1. Patients must be admitted to the ICU for the first time.

Feature selection
We applied Lasso regression, a regularization technique, on the preprocessed dataset. Lasso performs feature selection by shrinking the coefficients of less important features to zero, effectively eliminating them from the model. The optimal regularization parameter (λ) for Lasso was determined using the coordinate descent algorithm. Following Lasso regression, the variables were ranked based on their corresponding non-zero coefficients.
The final predictive model included the top 14 variables with the highest absolute coefficient values. www.nature.com/scientificreports/
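As a rough illustration of the two ideas above, the sketch below shows the soft-thresholding step that lets a Lasso coordinate update set a coefficient exactly to zero, followed by ranking the surviving coefficients by absolute value. The function names and the coefficient values are hypothetical, and the ranking is only meaningful if the features were standardized beforehand, which is an assumption here rather than something stated in the text.

```python
import math


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values with |z| <= lam become exactly 0."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def top_k_by_abs_coefficient(coefs, k):
    """Return the k feature names with the largest non-zero |coefficient|."""
    nonzero = {name: c for name, c in coefs.items() if c != 0.0}
    ranked = sorted(nonzero, key=lambda name: abs(nonzero[name]), reverse=True)
    return ranked[:k]


# Hypothetical unpenalized coefficients, for illustration only.
raw = {"gcs": -1.4, "sofa": 1.1, "age": 0.3, "rdw": 0.7, "sodium": -0.8}
lam = 0.5
shrunk = {name: soft_threshold(value, lam) for name, value in raw.items()}

# "age" is shrunk to zero and therefore drops out of the ranking.
selected = top_k_by_abs_coefficient(shrunk, k=3)
print(selected)
```

In practice a library implementation would fit all coefficients jointly; this sketch isolates only the shrink-then-rank logic.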

Statistical analysis
The normality of the distribution was evaluated using the Kolmogorov-Smirnov test. Continuous variables were presented as mean with standard deviation if they followed a normal distribution, or as median with 25-75th percentile if they deviated from normality. The Student's t-test or Mann-Whitney test was applied accordingly to analyze the continuous variables. Categorical variables were presented as counts and percentages, and the chi-square test was utilized to compare the distributions.
In this study, we employed five different ML algorithms to develop models: Logistic regression (LR), K-nearest neighbors (KNN), Adaptive boosting (AdaBoost), Random forest (RF) and eXtreme Gradient Boosting (XGBoost). The MIMIC IV dataset was initially partitioned into a training set (70%) and an internal validation set (30%). Furthermore, we utilized the Zhejiang Hospital dataset as an external validation set. The validation process employed a bootstrap resampling technique with 1000 iterations to evaluate the model's performance. The area under the curve (AUC) and 95% confidence intervals (CI) were calculated. Furthermore, several evaluation metrics, including accuracy, sensitivity, specificity, Youden index, and F1 score, were computed. Model performance was assessed by tenfold cross-validation, with results averaged across folds. For hyperparameter selection, grid search methods were utilized.
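The bootstrap evaluation of the AUC described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the percentile method is used for the interval, resamples containing a single class are skipped, and the function names, the seed, and the toy data are hypothetical. The paper does not specify which bootstrap interval was used.

```python
import random


def auc(y_true, y_score):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical labels (1 = died in hospital) and predicted risks.
y_demo = [0, 0, 1, 1, 0, 1, 0, 0]
score_demo = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.05, 0.55]
print(auc(y_demo, score_demo), bootstrap_auc_ci(y_demo, score_demo))
```

The pairwise AUC is quadratic in the sample size, which is acceptable for a sketch but slower than rank-based implementations on large cohorts.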
To assess the performance and clinical applicability of the predictive model, we generated calibration curves and clinical decision curves. Calibration curves were used to evaluate the predictive accuracy and calibration of the model by comparing the predicted probabilities with actual observations. Clinical decision curves, in turn, were employed to quantify the model's net benefit at various decision thresholds, thus assessing its usefulness for clinical decision-making. After selecting the optimal model, we utilized the SHAP package in Python to demonstrate the importance of each feature. Subsequently, we developed a web-based visual interface using Streamlit to demonstrate the functionality of the selected machine learning model. Users can input relevant data parameters or upload datasets for real-time model evaluation. The model processes the input data, generating predictive outcomes based on the underlying learning patterns.
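Decision curve analysis is conventionally summarized as net benefit, which weighs true positives against false positives at a chosen risk threshold. The sketch below uses the standard net-benefit formula; the function names and the toy data are assumptions for illustration and do not reproduce the study's analysis.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def treat_all_net_benefit(y_true, threshold):
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical outcomes (1 = died in hospital) and predicted risks.
y_dca = [1, 0, 1, 0, 0]
p_dca = [0.9, 0.2, 0.6, 0.7, 0.1]
for t in (0.2, 0.5):
    print(t, net_benefit(y_dca, p_dca, t), treat_all_net_benefit(y_dca, t))
```

A decision curve plots these values over a range of thresholds; a model is clinically useful where its curve lies above both the treat-all line and the treat-none line (net benefit of zero).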
Statistical significance was set at P < 0.05, and all tests were two-tailed. Statistical analyses were performed using R software (version 4.3.1) or Python software (version 3.11).

Results
Baseline characteristics
The present study involved a total of 1596 patients, including 1486 patients from the internal cohort extracted from the MIMIC-IV database and 110 patients from the external cohort extracted from the Zhejiang Hospital database. In the internal cohort, there were 349 in-hospital deaths (23.48%), whereas the external cohort had 18 in-hospital deaths (16.36%). Table 1 provides an overview of the baseline characteristics for both the internal and external cohorts.

Key variables
Within the training set, LASSO regression was applied for automated feature selection as illustrated in Fig. 2. Lasso regression is a method for regression analysis that reduces unnecessary model complexity by introducing a regularization term (λ) for variable selection and complexity adjustment. From the initial pool of 46 candidate variables, we identified the top 14 based on their importance and integrated them into the final model. The selected variables encompassed: use of anticoagulants, use of mannitol, use of vasoactive drugs, mechanical ventilation, temperature, surgical intervention, serum potassium, heart failure, blood oxygen saturation, SOFA, GCS, serum sodium, RDW and serum chloride.

Model performance
The discriminative ability of all models to predict mortality is shown in Fig. 3 and Table 2. XGBoost, KNN, LR, RF, and AdaBoost models were built on the training set, and their AUCs in the internal validation set were 0.907, 0.808, 0.851, 0.897, and 0.900, respectively. The XGBoost model had the highest prediction performance among these five models (AUC 0.907; 95% CI 0.875-0.939; accuracy: 0.874; sensitivity: 0.582). In the external validation set, the XGBoost model demonstrated predictive power with an AUC of 0.788, second only to the LR model, which achieved an AUC of 0.790.
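The threshold-based metrics reported alongside the AUC (accuracy, sensitivity, specificity, Youden index, and F1 score) all derive from the confusion matrix. The following sketch uses the standard definitions; the function name and the toy labels are hypothetical. It also shows how accuracy can exceed sensitivity when deaths are the minority class, as in the figures reported above.

```python
def classification_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels (1 = event, 0 = no event)."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden": sensitivity + specificity - 1,
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Hypothetical labels and thresholded predictions, for illustration only.
y_obs = [1, 1, 1, 0, 0, 0, 0, 0]
y_hat = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_obs, y_hat)
print(metrics)
```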
Figure 4A shows the calibration plots for all five models. The calibration curve analysis showed that XGBoost was accurately calibrated in predicting the risk of in-hospital death, with no significant over- or underestimation (Fig. 4B). In addition, decision curve analysis (DCA) showed that XGBoost had the highest net benefit across risk thresholds compared with all other models (Fig. 4C,D).
The importance of features derived from the XGBoost model is shown in Fig. 5. GCS score was the most influential feature, followed by SOFA score, use of anticoagulants, use of mannitol, oxygen saturation, body temperature, serum sodium, serum potassium, RDW, mechanical ventilation, heart failure, serum chloride, use of vasoactive drugs and surgical intervention.
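To make the idea behind SHAP-style attributions concrete, the sketch below computes exact Shapley values for a toy risk function by averaging each feature's marginal contribution over all feature orderings. Everything here is an illustrative assumption: toy_risk is a made-up function, not the fitted XGBoost model, and a single reference patient stands in for the background data. The SHAP package uses far more efficient algorithms for tree models than this brute-force enumeration.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input."""
    names = list(x)
    totals = {name: 0.0 for name in names}
    orders = list(permutations(names))
    for order in orders:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = x[name]  # switch this feature to the patient's value
            updated = predict(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orders) for name, total in totals.items()}


def toy_risk(f):
    """Hypothetical risk score with one interaction term (not the study's model)."""
    return (0.05 * (15 - f["gcs"]) + 0.04 * f["sofa"]
            + 0.10 * f["mannitol"] + 0.01 * f["sofa"] * f["mannitol"])


patient = {"gcs": 6, "sofa": 8, "mannitol": 1}
reference = {"gcs": 15, "sofa": 2, "mannitol": 0}
phi = shapley_values(toy_risk, patient, reference)
print(phi)
```

The attributions sum to the difference between the patient's prediction and the reference prediction, which is the additivity property that makes such plots interpretable.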

Application of the model
Additionally, a web-based computational tool using the XGBoost model has been developed to enable clinicians to predict, in real time, the prognosis of patients with severe sICH (accessible at https://sich-mimic.streamlit.app/). Figure 6 shows an example of using the web-based real-time prediction tool. This example highlights the patient's heightened risk of in-hospital mortality and indicates that variables such as temperature, medications, and other factors serve as prognostic risk factors. Clinicians are advised to promptly regulate the patient's temperature, consider conservative anticoagulant therapy, and evaluate the discontinuation of mannitol if deemed feasible and suitable.

Discussion
In this retrospective study, we developed and validated a clinical feature-based machine learning algorithm for predicting in-hospital mortality in critically ill patients with sICH. Among the tested models, the XGBoost model demonstrated the highest prediction performance. Through advanced machine learning techniques, we successfully identified several key clinical features strongly associated with in-hospital mortality, including GCS score, SOFA score, use of mannitol medication, use of anticoagulant medication, vital signs, serum electrolytes, RDW, among others. These findings are significant and warrant further investigation. Additionally, we have created an easy-to-use web-based calculator to assist clinicians in making informed decisions regarding further treatment.
Among various types of strokes, cerebral hemorrhage is characterized by a relatively high in-hospital mortality rate, especially in patients admitted to ICU. The in-hospital mortality rate of patients varies based on both the location and volume of the hematoma. Previous studies have reported an early mortality rate of 40% and a long-term mortality rate as high as 60% for sICH patients [16][17][18]. Marika Fallenius et al. analyzed patients admitted to the ICU with severe cerebral hemorrhage and found a mortality rate of 42% for supratentorial sICH patients and 49% for infratentorial sICH patients [19]. Additionally, researchers investigated a 30-day mortality rate of up to 54% for patients with severe sICH in the southern region of Spain, and this rate increased to 60% for patients with hematoma volumes exceeding 30 ml [20]. The mortality rate observed in our study was lower compared to the case-fatality rates reported in previous studies. This difference may be attributed to our exclusion of patients admitted for less than one day or those automatically discharged, as well as differences in medical conditions.
In this study, we employed five distinct ML methods to develop predictive models. The performance evaluation of these algorithms was based on six common metrics (AUC, F1 score, accuracy, sensitivity, specificity and Youden Index). Notably, the results indicate that the XGBoost model exhibits the best overall performance and predictive stability, which contrasts with previous findings favoring the Random Forest model [21].
Table 1. Demographic and Clinical Characteristics of Hospitalization Survival and Mortality Groups in MIMIC IV and Zhejiang Hospital Database. COPD chronic obstructive pulmonary disease, SBP systolic blood pressure, DBP diastolic blood pressure, MBP mean blood pressure, WBC white blood cell, RBC red blood cell, RDW red blood cell distribution width, BUN blood urea nitrogen, MCH mean corpuscular hemoglobin, MCHC mean corpuscular hemoglobin concentration, MCV mean corpuscular volume, INR international normalized ratio, PT prothrombin time, PTT partial thromboplastin time, SOFA sequential organ failure assessment, GCS Glasgow Coma Scale, RRT renal replacement therapy.
XGBoost is an efficient, flexible, and scalable ML algorithm, renowned for its classification capabilities. To mitigate overfitting and optimize its performance, XGBoost employs techniques such as improved subsampling rates, learning rates, and maximum tree depth control [22]. Zhu et al. evaluated data from ICU patients who were intubated due to respiratory failure and received mechanical ventilation. They utilized seven learning algorithms to predict in-hospital mortality, with XGBoost demonstrating the best overall performance [23]. Similarly, Hu et al. incorporated data from 8817 sepsis patients into seven models to predict in-hospital mortality, and they also found that the XGBoost model exhibited the most effective predictive ability [24].
Despite the success of algorithms in this field, one of the current challenges lies in the need to interpret the "black box" of ML. Thus, we utilized the visualization function in SHAP to identify the impact of specific variable values on the model output. As anticipated, the GCS score takes the top position in the SHAP importance ranking. The GCS is a widely used scale for assessing the level of consciousness, with scores ranging from 3 to 15. Previous studies have consistently demonstrated the importance of the GCS score in evaluating the severity of neurological disorders [21,25,26]. The SOFA score serves as a valuable tool for quantifying the extent of organ dysfunction or failure at the point of ICU admission and has found widespread application in predicting in-hospital mortality in this setting [27][28][29]. It has been observed that the SOFA score exhibits superior predictive performance compared to other scoring systems when it comes to infection-related in-hospital mortality in ICU patients [30].
The use of anticoagulants in patients with cerebral hemorrhage and the timing of anticoagulant use remain controversial, and some studies have suggested that anticoagulants have a positive effect on patient prognosis. This might be because the use of anticoagulants in critically ill patients reduces complications such as thromboembolism and does not significantly increase bleeding complications [31]. Currently, the primary non-surgical treatment for cerebral hemorrhage involves the use of drugs like mannitol to reduce intracranial pressure. However, our study revealed that the use of mannitol may lead to a poor prognosis for patients with this disease. According to current guidelines, hypertonic saline demonstrates superior efficacy in managing cerebral edema associated with cerebral hemorrhage compared to mannitol [32]. Mannitol, which can elevate the risk of intracranial hemorrhage, may be less preferable in such cases. Oximetry ranked fifth in importance in our model. However, contrary to clinical expectations, we observed that oxygen saturation was lower in survivors compared to non-survivors in the internal cohort (97.22% vs. 98.34%) at baseline. It is possible that over-oxygenating ICU patients within normal oxygen saturation levels can lead to unfavorable prognoses and more adverse outcomes [33]. Moreover, prior research has established that electrolyte disturbances represent an independent risk factor for an unfavorable prognosis in stroke patients [34][35][36]. Lastly, to our surprise, the study indicates that in critically ill patients, surgical treatment may not hold significant importance. The low rate of surgery in sICH in the internal MIMIC dataset (8.60-11.96%) may be due to the fact that we included both critically and mildly ill patients, leading to a weakening of the influence of surgical intervention as an important factor, as it is known that surgical treatment may be more effective in patients with high bleeding volumes.
This study holds significant clinical and methodological implications. Firstly, we implemented an external validation set to mitigate the risk of model overfitting. Secondly, the model was developed using readily available data collected within 24 h of patient admission, enabling early and accurate mortality prediction. This provides clinicians with more time to adjust treatment strategies accordingly. Thirdly, the study sheds light on previously overlooked factors, such as anticoagulant use and RDW, which are now identifiable. Integrating these factors with machine learning methods enhances the predictive

Limitation
However, our study has several limitations that need to be acknowledged. Firstly, it was a retrospective and observational study, which may introduce certain research biases. Secondly, as our study was focused on patients with sICH, we did not include information on radiologic variables, such as hematoma volume or location, which could potentially provide additional insights into the disease. Thirdly, the complexity of the model with 14 inputs may pose challenges for practical implementation in clinical settings, suggesting the necessity of integrating the algorithms with electronic medical record systems. Fourthly, variations in data collection methods between the open dataset and the local dataset may introduce realistic discrepancies. Lastly, the diversity of patient populations across different ICUs, as evidenced by differing death rates between hospitals, may impact the generalizability of the study findings, highlighting the importance of considering regional factors in result interpretation. It is possible that the local dataset represents a subgroup of critical care cases within the larger open dataset.

Conclusion
The XGBoost model demonstrated superior performance in predicting short-term mortality among sICH patients. Our findings indicate that factors such as GCS, SOFA score, mannitol use, anticoagulant use, oxygen saturation, time of ICU admission, temperature, serum sodium, mechanical ventilation, and serum potassium are strongly associated with in-hospital mortality in sICH patients. This newly developed risk model is expected to serve as a convenient tool for risk stratification.

2. Patients must have a confirmed diagnosis of sICH.
3. Patients' age should fall within the range of 18-90 years.
4. Patients must have an ICU length of stay exceeding 1 day.
5. Patients must have complete clinical data.
The flowchart for patient recruitment is shown in Fig. 1.
https://doi.org/10.1038/s41598-024-65128-8

Figure 1. Model development process and flowchart of the study.

Figure 2. Demographic and clinical feature selection. The automated feature selection process for 46 clinical factors was executed utilizing the LASSO, aiming to minimize the binomial deviance loss function, shrink coefficients, and generate some zero coefficients to facilitate efficient feature selection (A). Subsequently, the algorithm identified and retained 14 filtered features with non-zero coefficients for integration into the model generation process (B).

Figure 3. Area under the receiver operating characteristic curve for machine learning models in the internal validation cohort (A) and external validation cohort (B). ROC receiver operating characteristic, CI confidence interval.

Figure 4. Calibration plots of five ML models in the internal validation cohort (A,B). Decision curve analysis for five ML models in the internal validation cohort (C,D).

Figure 5. Scatter plot of variables for SHAP analysis (A) and importance ranking plot (B) of the XGBoost. A visual representation illustrates the importance of each feature in the XGBoost model, depicting the relationship between them. The color scale indicates the variable values, with red denoting higher values and blue indicating lower values.

Figure 6. Case of website usage. Input values are entered to estimate the prognosis for sICH, and the contribution of each variable's SHAP value to the prediction is shown. An in-hospital mortality risk of 44.812% was predicted. Additionally, factors including body temperature, non-utilization of anticoagulants, and mannitol usage were linked to an unfavorable prognosis in patients with sICH.

Table 2. Predictive performance of machine learning models in internal and external validation sets.